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Abstract — The smart grid vision is to build an intelligent power 
network with an unprecedented level of situational awareness 
and controllability over its services and infrastructure. This 
paper advocates statistical inference methods to robustify power 
monitoring tasks against the outlier effects owing to faulty 
readings and malicious attacks, as well as against missing data 
due to privacy concerns and communication errors. In this 
context, a novel load cleansing and imputation scheme is developed 
leveraging the low intrinsic-dimensionality of spatiotemporal load 
profiles and the sparse nature of "bad data." A robust estimator 
based on principal components pursuit (PCP) is adopted, which 
effects a twofold sparsity-promoting regularization through an 
^i-norm of the outliers, and the nuclear norm of the nominal 
load profiles. Upon recasting the non-separable nuclear norm 
into a form amenable to decentralized optimization, a distributed 
(D-) PCP algorithm is developed to carry out the imputation 
and cleansing tasks using networked devices comprising the so- 
termed advanced metering infrastructure. If D-PCP converges 
and a qualification inequality is satisfied, the novel distributed 
estimator provably attains the performance of its centralized PCP 
counterpart, which has access to all networkwide data. Computer 
simulations and tests with real load curve data corroborate the 
convergence and effectiveness of the novel D-PCP algorithm. 

Index Terms — Advanced metering infrastructure, distributed 
algorithms, load curve cleansing and imputation, principal com- 
ponents pursuit, smart grid. 



I. Introduction 

The US power grid has been recognized as the most 
important engineering achievement of the 20th century ||26) . 
yet it presently faces major challenges related to efficiency, 
reliability, security, environmental impact, sustainability, and 
market diversity issues ll25l . The crystallizing vision of the 
smart grid (SG) aspires to build a cyber-physical network that 
can address these challenges by capitalizing on state-of-the-art 
information technologies in sensing, control, communication, 
optimization, and machine learning. Significant effort and 
investment are being committed to architect the necessary 
infrastructure by installing advanced metering systems, and 
establishing data communication networks throughout the grid. 
Accordingly, algorithms that optimally exploit the pervasive 
sensing and control capabilities of the envisioned advanced 
metering infrastructure (AMI) are needed to make the neces- 
sary breakthroughs in the key problems in power grid monitor- 
ing and energy management. This is no easy endeavor though, 
in view of the challenges posed by increasingly distributed 
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network operations under strict reliability requirements, also 
facing malicious cyber-attacks. 

Statistical inference techniques are expected to play an in- 
creasingly instrumental role in power system monitoring llTol . 
not only to meet the anticipated "big data" deluge as the 
installed base of phasor measurement units (PMUs) reaches 
out throughout the grid, but also to robustify the monitoring 
tasks against the "outlier" effects owing to faulty readings, 
malicious attacks, and communication errors, as well as 
against missing data due to privacy concerns and technical 
anomalies m. In this context, a load cleansing and imputation 
scheme is developed in this paper, building on recent advances 
in sparsity-cognizant information processing [fl2l, low-rank 
matrix completion [9|, and large-scale distributed optimiza- 
tion E|. 

Load curve data refers to the electric energy consumption 
periodically recorded by meters at points of interest across the 
power grid, e.g., end-user premises, buses, and substations. 
Accurate load profiles are critical assets aiding operational 
decisions in the envisioned SG system jT], and are essential 
for load forecasting ll24l . However, in the process of acquir- 
ing and transmitting such massive volumes of information 
for centralized processing, data are oftentimes corrupted or 
lost altogether In a smart monitoring context for instance, 
incomplete load profiles emerge due to three reasons: (rl) 
PMU-instrumented buses are few; (r2) SCADA data become 
available at a considerably smaller time scale than PMU 
data; and (r3) regional operators are not willing to share all 
their variables ||231 . Moreover, a major requirement for grid 
monitoring is robustness to outliers, i.e., data not adhering 
to nominal models HI, 1221 . Sources of so-termed "bad 
data" include meter failures, as well as strikes, unscheduled 
generator shutdowns, and extreme weather conditions ||7], 
i fTTI . Inconsistent data can also be due to malicious (cyber- 
) attacks that induce abrupt load changes, or counterfeit meter 
readings 1181 . 

In light of the aforementioned observations, the first con- 
tribution of this paper is on modeling spatiotemporal load 
profiles, accounting for the structure present to effectively 
impute missing data and devise robust load curve estima- 
tors stemming from convex optimization criteria (Section |ll|. 
Existing approaches to load curve cleansing have relied on 
separate processing per time series Q, llTTI . iBTI . and have 
not capitalized on spatial correlations to improve performance. 
The aim is for minimal-rank cleansed load data, while also 
exploiting outlier sparsity across buses and time. An estimator 
tailored to these specifications is principal components pursuit 
(PC?) 121, 0, El, which is outhned in Section Uni PCP 
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minimizes a tradeoff between the least-squares (LS) model 
fitting error and a twofold sparsity-promoting regularization, 
implemented through an ^i-norm of the outliers and the 
nuclear norm of the nominal load profiles. While PCP has been 
widely adopted in computer vision Q, for voice separation in 
music [13], and unveiling network anomalies 1201 . its benefits 
to power systems engineering and monitoring remains so 
far largely unexplored. The second contribution pertains to 
developing a distributed (D-) PCP algorithm, to carry out the 
imputation and cleansing task using a network of intercon- 
nected devices as part of the AMI (SectionUvTi. This is possible 
by leveraging a general algorithmic framework for sparsity- 
regularized rank minimization put forth in ll20l . Upon recasting 
the (non-separable) nuclear norm present in the PCP cost 
into a form amenable to decentralized optimization (Section 
lIV-Al i. the D-PCP iterations are obtained in Section IIV-CI via 
the multi-block alternating-directions method of multipliers 
(AD MM) solver |[3l, H. In a nutshell, per iteration each smart 
meter exchanges simple messages with its (directly connected) 
neighbors in the network, and then solves its own optimization 
problem to refine its current estimate of the cleansed load 
profile. In the context of power systems, the ADMM has 
been recently adopted to carry out dynamic network energy 
management |16], and distributed robust state estimation |[T4l . 

Computer simulations corroborate the convergence and op- 
timality of the novel D-PCP algorithm, and demonstrate its 
effectiveness in cleansing and imputing real load curve data 
(Section [V]i. Concluding remarks and directions for future 
research are outlined in Section |VI] while a few algorithmic 
details are deferred to the Appendix. 

II. Modeling and Problem Statement 

This section introduces the model for (possibly) incomplete 
and grossly corrupted load curve measurements, acquired by 
geographically-distributed metering devices monitoring the 
power grid. The communication network model needed to 
account for exchanges of information among smart meters is 
described as well. Lastly, the task of load curve cleansing and 
imputation is formally stated. 

A. Spatiotemporal load curve data model 

Let the x 1 vector y{t) := [yi^t, ■ ■ ■ ,yN.t]' (' stands 
for transposition) collect the spatial load profiles measured by 
smart meters monitoring network nodes (buses, residential 
premises), at a given discrete-time instant t G [1, T]. Consider 
the A^ X r matrix of observations Y [y(l), . . . , y(T)]. The 
71-th row (y„)' of Y is the time series of energy consumption 
(load curve) measurements at node n, while the t-th column 
y(t) of Y represents a snapshot of the networkwide loads 
taken at time t. To model missing data, consider the set 
fl C {1, . . . , A^} X {1, . . . , T} of index pairs (n, t) defining a 
sampling of the entries of Y. Introducing the matrix sampling 
operator Pn{-), which sets the entries of its matrix argument 
not indexed by ft to zero and leaves the rest unchanged, the 
(possibly) incomplete spatiotemporal load curve data in the 
presence of outliers can be modeled as 

n2(Y) = n2(x + o + E) (1) 



where X, O, and E denote the nominal load profiles, the out- 
liers, and small measurement errors, respectively. For nominal 
observations i/n.t = Xn.t + en,t, one has o„_t = 0. 
Remark 1 (Model under-determinacy): The model is in- 
herently under-determined, since even for the (most favorable) 
case of full data, i.e., ee {1, . . . , A^} x {1, . . . , T}, there are 
twice as many unknowns in X and O as there is data in Y. 
Estimating X and O becomes even more challenging when 
data are missing, since the number of unknowns remains the 
same, but the amount of data is reduced. 

In any case, estimation of {X, O} from Pq{Y) is an ill- 
posed problem unless one introduces extra structural assump- 
tions on the model components to reduce the effective degrees 
of freedom. To this end, two cardinal properties of X and 
O will prove instrumental. First, common temporal patterns 
among the energy consumption of a few broad classes of 
loads (e.g., industrial, residential, seasonal) in addition to their 
(almost) periodic behaviors render most rows and columns 
of X linearly dependent, and thus X typically has low-rank. 
Second, outliers (or attacks) only occur sporadically in time 
and affect only a few buses, yielding a sparse matrix O. 
Smoothness of the nominal load curves is related to the low- 
rank property of X, which was adopted in Q to motivate 
a smoothing splines-based algorithm for cleansing. However, 
while in Q smoothness was enforced per load profile time- 
series, i.e., per row (x„)' of X; the low-rank property of 
X also captures the spatial dependencies introduced by the 
network. Approaches capitalizing on outUer- and "bad data-" 
sparsity can be found in e.g., lfT4l . ifTSl and ||2T1 . 

B. Communication network model 

Suppose that on top of the energy measurement functional- 
ity, the A^ networked smart meters are capable of performing 
simple local computations, as well as exchanging messages 
among directly connected neighbors. Single-hop communica- 
tion models are appealing due to their simplicity, since one 
does not have to incur the routing overhead. The AMI network 
is naturally abstracted to an undirected graph G{J\f, £), where 
the vertex set Af := {1, . . . , A^} corresponds to the network 
nodes, and the edges (links) in C represent pairs of nodes that 
are connected via a physical communication channel. Node 
n € J\f communicates with its single-hop neighboring peers 
in J'n, and the size of the neighborhood will be henceforth 
denoted by \J'n\- The graph G is assumed connected, i.e., there 
exists a (possibly multihop) path that joins any pair of nodes 
in the network. This requirement ensures that the network 
is devoid of multiple isolated (connected) components, and 
allows for the data collected by e.g., smart meter n, namely 
the n-th row (y,i)' of Y, to eventually reach every other node 
in the network. This way, even when only local interactions 
are allowed, the flow of information can percolate the network. 

The importance of the network model will become apparent 
in Section |IV] 

C. Load curve cleansing and imputation 

The load curve cleansing and imputation problem studied 
here entails identification and removal of outliers (or "bad 
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data"), in addition to completion of the missing entries from 
the nominal load matrix, and denoising of the observed 
ones. To some extent, it is a joint estimation-interpolation 
(prediction)-detection problem. With reference to ([T), given 
generally incomplete, noisy and outlier-contaminated spa- 
tiotemporal load data Pq{Y), the cleansing and imputation 
tasks amount to estimating the nominal load profiles X and 
the outliers O, by leveraging the low-rank property of X and 
the sparsity in O. Collaboration between metering devices 
(collecting networkwide data) is considered here, rather than 
local processing of load curves per bus. 

Note that load cleansing and imputation are different from 
load forecasting ll24l . which amounts to predicting future load 
demand based on historical data of energy consumption and 
the weather conditions. Actually, cleansing and imputation are 
critical preprocessing tasks utilized to enhance the quality of 
load data, that would be subsequently used for load forecasting 
and optimum power flow [2|. 



to a central monitoring and data analytics station, which 
uses their aggregation in 7'q(Y) to reject outliers and im- 
pute missing data. While for the most part this is the pre- 
vailing operational paradigm nowadays, there are limitations 
associated with this architecture. For instance, collecting all 
this information centrally may lead to excessive overhead 
in the communication network, especially when the rate of 
data acquisition is high at the meters. Moreover, minimizing 
(or avoiding altogether) the exchanges of raw measurements 
may be desirable for privacy and cyber-security reasons, as 
well as to reduce unavoidable communication errors that 
translate to missing data. Performing the optimization in a 
centralized fashion raises robustness concerns as well, since 
the central data analytics station represents an isolated point 
of failure. These reasons motivate devising fully-distributed 
iterative algorithms for PCP, embedding the load cleansing 
and imputation functionality to the AMI. This is the subject 
of the next section. 



III. Prinicipal Components Pursuit 

An estimator matching nicely the specifications of the load 
curve cleansing and imputation problem stated in Section 
III-CI is the so-termed (stable) principal components pursuit 
(PCP) 121, IS, m, that will be outlined here for complete- 
ness. PCP seeks estimates {X, O} as the minimizers of 

(PI) min^ 1 llPolY - X - 0)||^ + A, |1X||, + Ai ||0|1, 

where the ^i-norm ||0||i := J2n t Wn,t\ and the nuclear norm 
||X||* := <^iO^) i'^iO^) denotes the i-th singular value of 
X) are utilized to promote sparsity in the number of outliers 
(nonzero entries) in O, and the low rank of X, respectively. 
The nuclear and -norms are the closest convex surrogates 
to the rank and cardinality functions, which albeit the most 
natural criteria they are in general NP-hard to optimize. The 
tuning parameters Ai , A* > control the tradeoff between 
fitting error, rank, and sparsity level of the solution. When 
an estimate of the observation noise variance is available, 
guidelines for selecting A* and Ai have been proposed in 1281 . 
The nonzero entries in O reveal "bad data" across both buses 
and time. Clearly, it does not make sense to flag outliers in 
data that has not been observed, namely for (n, t) ^ fl. In 
those cases (PI) yields 6„ ^ = since both the Frobenious 
and ^i-norms are separable across the entries of their matrix 
arguments. 

Being convex (PI) is computationally appealing, and it 
has been shown to attain good performance in theory and 
practice. For instance, in the absence of noise and when there 
is no missing data, identifiability and exact recovery conditions 
were reported in 15] and [|6]. Even when data are missing, 
it is possible to recover the low-rank component under some 
technical assumptions l5\. Theoretical performance guarantees 
in the presence of noise are also available ||281 . 

Regarding algorithms, a PCP solver based on the accelerated 
proximal gradient method was put forth in IITtI . while the 
ADMM was employed in (27l . Implementing these central- 
ized algorithms presumes that networked metering devices 
continuously communicate their local load measurements y„.t 



IV. Distributed Cleansing and Imputation 

A distributed (D-)PCP algorithm to solve (PI) using a 
network of smart meters (modeled as in Section III-Bb should 
be understood as an iterative method, whereby each node 
carries out simple local (optimization) tasks per iteration 
k = 1,2,..., and exchanges messages only with its directly 
connected neighbors. The ultimate goal is for each node to 
form local estimates x„ [k] and o„ [fc] that coincide with the n- 
th rows of X and O as fc — > oo, where {X, O} is the solution 
of (PI) obtained when all data Vn{Y) are centrally available. 
Attaining the centralized performance with distributed data is 
impossible if the network is disconnected. 

To facilitate reducing the computational complexity and 
memory storage requirements of the D-PCP algorithm sought, 
it is henceforth assumed that an upper bound rank(X) < p 
is a priori available [recall X is the estimated low-rank 
cleansed load profile obtained via (PI)]. As argued next, 
the smaller the value of p, the more efficient the algorithm 
becomes. Small values of p are well motivated due to the low 
intrinsic dimensionality of the spatiotemporal load profiles (cf. 
Section Hl-Al i. Because rank(X) < p, (Pl)'s search space is 
effectively reduced and one can factorize the decision variable 
as X = PQ', where P and Q are x p and T x p matrices, 
respectively. Adopting this reparametrization of X in (PI) and 
making explicit the distributed nature of the data (cf. Section 
III-BI ). one arrives at an equivalent optimization problem 



N 



(P2) 



mm 
{P,Q 



1 V 



1 



+^I|PQ'I 



+ Ai||o„ 



which is non-convex due to the bilinear term PQ', and where 
P := [pi, . . . , Pat]'. The number of variables is reduced from 
2NT in (PI), to p{N + T) + NT in (P2). The savings can 
be significant when p is small, and both N and T are large. 
Note that the dominant iVT-term in the variable count of (P2) 
is due to O, which is sparse and can be efficiently handled 
even when both N and T are large. 
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Remark 2 (Challenges facing distributed implementation): 

Problem (P2) is still not amenable for distributed 
implementation due to: (cl) the non-separable nuclear 
norm present in the cost function; and (c2) the global variable 
Q coupling the per-node summands. 

Challenges (cl)-(c2) are dealt with in the ensuing sections. 

A. A separable low-rank regularization 

To address (cl), consider the following alternative charac- 
terization of the nuclear norm (see e.g. ||201 ) 



|X||* mill \ 
{P,Q} 2 



Q 



II), 



s. to X = PQ'. 



(2) 

The optimization (|2]i is over all possible bilinear factorizations 
of X, so that the number of columns p of P and Q is also a 
variable. Leveraging (|2]i, the following reformulation of (P2) 
provides an important first step towards obtaining the D-PCP 
algorithm: 



N 

(P3) min V 
{p,Q,o}f-; 



2ll^o„(yn 



QPn - On)\\2 + Al||o„||l 



IQIIf 



As asserted in 11201 Lemma 1], adopting the separable 
Frobenius-norm regularization in (P3) comes with no loss of 
optimality relative to (PI), provided rank(X) < p. By finding 
the global minimum of (P3) [which could have considerably 
less variables than (PI)], one can recover the optimal solution 
of (PI). This could be challenging however, since (P3) is non- 
convex and it may have stationary points which need not be 
globally optimum. 

Interestingly, it is possible to certify global optimality of a 
stationary point {P, Q, O} of (P3). Specifically, one can estab- 
lish that if ||7'ji(Y-PQ'-0) || < A*, then {X := PQ',6 := 
O} is the globally optimal solution of (PI) EqI Prop. 1]. The 
qualification condition ||7'j2(Y — PQ' — 0)|| < A, captures 
tacitly the role of p. In particular, for sufficiently small p the 
residual ||7'o(Y — PQ' — 0)|| becomes large and consequently 
the condition is violated [unless A* is large enough, in which 
case a sufficiently low -rank solution to (PI) is expected]. The 
condition on the residual also implicitly enforces rank(X) < p, 
which is necessary for the equivalence between (PI) and (P3). 

B. Local variables and consensus constraints 

To decompose the cost in (P3), in which summands inside 
the square brackets are coupled through the global variable 
Q [cf. (c2) under Remark |2], introduce auxiliary variables 
{Q,n}n=i representing local estimates of Q per smart meter 
n. To obtain a separable PCP formulation, use these estimates 
along with consensus constraints 



N 



(P4) min 

{P„,Q„ 



^||^0„(yn - QnP« - 0„)||i 



+Ai||o„||i + ;^(iV||p„||^ + ||Q„|||) 



s. to Q„ = Q„i, m e Jn , n e TV. 



Notice that (P3) and (P4) are equivalent optimization prob- 
lems, since the network graph G{N , C) is connected by 
assumption. Even though consensus is a fortiori imposed only 
within neighborhoods, it extends to the whole (connected) 
network and local estimates agree on the global solution of 
(P3). To arrive at the desired D-PCP algorithm, it is convenient 
to reparametrize the consensus constraints in (P4) as 



Q„ = F™, Q„ = F™, and F™ 



m e Jm n <E Af 
(3) 



where {F™, F^jngA/". auxiliary optimization variables 
that will be eventually eliminated (cf. Remark |3]l. 



C. The D-PCP algorithm 

To tackle (P4), associate Lagrange multipliers M™ and M™ 
with the first pair of consensus constraints in (|3). Introduce the 
quadratically augmented Lagrangian function [31 



N 



I^f2„(y« - QnPn - 0„)||| 



A* 



+Ai||o„|Ii + ^(^||p„||^ + ||Q„|||) 



N 



((M™,Q„-F™) + (M™,Q„,-F™) 

n=l meJn 



N 



i^m 1 1 2 

\f 



nn-r:\m 



(4) 



n—l m£j'n 



where c > is a penalty parameter, and the primal variables 
are split into three groups Vi := {Q„}^=i, V2 := {p,i}^=i 



and V3 := {o„, F™ , F™}"^^^". For notational brevity, collect 
all Lagrange multipUers in M := {M;^, M;^'}™|^". Note 
that the remaining constraints in (|3), namely Cp := {F™ = 
F™, m & Jm n G A/"}, have not been dualized. 

To minimize (P4) in a distributed fashion, (a multi-block 
variant of) the ADMM will be adopted here. The ADMM 
is an iterative augmented Lagrangian method especially well 
suited for parallel processing ||3], ||4], which has been proven 
successful to tackle the optimization tasks stemming from gen- 
eral distributed estimators of deterministic and (non-)stationary 
random signals; see e.g., llT4l . Il20l and references therein. The 
proposed solver entails an iterative procedure comprising four 
steps per iteration fc = 1, 2, . . ., [n G A/", m G Jn in (IS])-©] 

[SI] Update dual variables: 



c(Q„[fc]~Fr;[fc]) (5) 
c(Q,„[fc] -F;r[fc]). (6) 



[S2] Update first group of primal variables: 

Vi [fc + 1] = arg min ( Vi , V2 [fc] , V3 [fc] MM)- (7) 

Vi 

[S3] Update second group of primal variables: 

V2[fc + 1] =ai-g min/:,(Vi[fc + l],V2,V3[fc],A^[A:]). (8) 

V2 
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Algorithm 1 : D-PCP at smart meter n e Af 
input y„,fl„,Xt,Xi, and c. 

initialize S[0] = Otxp, and Q,i[l]i Pn[l] at random, 
for k = l,2,... do 

Receive {Qm[fc]} from neigiibors m £ Jn. 

[SI] Update local dual variables: 

S„[fc] = S„[fc - 1] + cE,„6J„(Qn[fc] - Q,n[fc]). 

[S2] Update first group of local primal variables: 

A„[fc+1] - {(p„[fc]p;[fc]) ® n„ + {K/N + 2c|J-„|) V}"' 

Qn [fc + 1] = unvec ^ A„ [fc + 1] | (p„ [fc] (g) n„ ) (y„ - o„ [k] ) 

-vec(S„[fc]) + vec(cE„6j„(Qn[fc] + Q„.[fc]))}) . 
[S3] Update second group of local primal variables: 

Pn[fc + 1] = {Q;jfc + l]r2„Qn[fc + 1] + Ip}"' 

xQ;jfc + l]n„(y„ -o„[fc]). 
[S4] Update third group of local primal variables: 

o„[fc + l] =<Sai (fln(yn -Q„[fc + l]p„[A: + l])). 
Transmit Q,4fc + 1] to neighbors m £ J'n. 
end for 

return Q„ [oo] , p„ [cx>] , o„ [oo] . 



[S4] Update third group of primal variables: 

V3 [fc + 1] = arg mill /Ic ( Vi [fc + 1] , V2 [fc + 1] , V3 , X [A:] ) 

(9) 

which amount to a block-coordinate descent method cycling 
over Vi V2 — !■ V3 to minimize Lc, and dual variable 
updates 13J. At each step while minimizing the augmented 
Lagrangian, the variable groups not being updated are treated 
as fixed, and are substituted with their most up to date values. 
Different from the standard two-block ADMM ||3], |j4|, the 
multi-block variant here cycles over three groups of primal 
variables |fT9l . 

Reformulating the estimator (PI) to its equivalent form (P4) 
renders the augmented Lagrangian in (|4) highly decomposable. 
The separability comes in two flavors, both with respect to the 
variable groups V1-V3, as well as across the network nodes 
n G M. This leads to highly parallelized, simplified recursions 
to be run by the networked smart meters. Specifically, it is 
shown in the Appendix that the aforementioned ADMM steps 
[S1]-[S4] give rise to the D-PCP iterations tabulated under 
Algorithm [U Per iteration, each device updates: [SI] a local 
matrix of dual prices Sn[A:]; [S2]-[S3] local cleansed load 
estimates Q„[fc + 1] and p„[A; + 1] obtained as solutions 
to respective unconstrained quadratic problems (QPs); and 
[S4] its local outlier vector, through a sparsity-promoting soft- 
thresholding operation. The {k + l)-st iteration is concluded 
after smart meter n transmits Q„[A: + 1] to its single-hop 
neighbors in Jn- Regarding communication cost, Q„[fc + 1] is 
aT X p matrix and its transmission does not incur significant 
overhead for small p. Observe also that Vniyn) need not be 
exchanged which is desirable to preserve data secrecy, and the 
communication cost is independent of N. 

Before moving on, a clarification on the notation used in 
Algorithm[T]is due. To define matrix in [S2]-[S4], observe 
first that the local sampling operator can be expressed as 
Vn„ (z) = <^n z, where denotes Hadamard product, and 
the binary masking vector a;„ G {0, 1}"^ has entries equal to 
1 if the corresponding entry of z is observed, and otherwise. 



It is then apparent that the Hadamard product can be replaced 
with the usual matrix- vector product as Vn^ (z) = J^jiZ, where 
J7„ := diag(a;„). Operators (g) and vec[-] denote Kronecker 
product and matrix vectorization, respectively. Finally, the 
soft-thresholding operator is S\-^ (•) := sign(-) max(|-| — Ai, 0). 
Remark 3 (Elimination of redundant variables): Careful 
inspection of Algorithm [T] reveals that the redundant auxiliary 
variables {F™, F™, M™}™|^" have been eliminated. Each 
smart meter, say the n-th, does not need to separately keep 
track of all its non-redundant multipliers {M™},„gj;^, but 
only update their respective sums S„ [fc] := 2 J^mej '^n'i^]- 
When employed to solve non-convex problems such as (P4), 
ADMM so far offers no convergence guarantees. However, 
there is ample experimental evidence in the literature which 
supports convergence of ADMM, especially when the non- 
convex problem at hand exhibits "favorable" structure |@]. For 
instance, (P4) is bi-convex and gives rise to the strictly convex 
optimization subproblems each time Cc is minimized with re- 
spect to one of the group variables, which admit unique closed- 
form solutions per iteration [cf. (|7)-(|9)]. This observation 
and the linearity of the constraints suggest good convergence 
properties for the D-PCP algorithm. Extensive numerical tests 
including those presented in Section |V] demonstrate that this 
is indeed the case. While a formal convergence proof is the 
subject of ongoing investigation, the following proposition 
asserts that upon convergence, the D-PCP algorithm attains 
consensus and global optimality. For a proof (omitted here 
due to space limitations), see ll20l Appendix C]. 

Proposition 1: Suppose iterates {Cln[k],Pn[k],On[k]}neAf 
generated by AlgorithmU] converge to {ClmPn,On}neJ^- If 
{X, 0} is the optimal solution of (PI), then Qi — Q2 = 
... = Qat. Also,_if \\Vn{^ - PQi - 0)|| < A*, then 
{X = PQ'i,6 = 0}. 

V. Numerical Tests 

This section corroborates convergence and gauges perfor- 
mance of the D-PCP algorithm, when tested using synthetic 
and real load curve data. 

A. Synthetic data tests 

A network of = 25 smart meters is generated as a 
realization of the random geometric graph model, meaning 
nodes are randomly placed on the unit square and two nodes 
communicate with each other if their Euclidean distance is 
less than a prescribed communication range of dc = 0.4; 
see Fig. [T] The time horizon is T = 600. Entries of E 
are independent and identically distributed (i.i.d.), zero-mean, 
Gaussian with variance = 10~^; i.e., ej.t ^ 7\/'(0, cr^). 
Low-rank spatiotemporal load profiles with rank r = 3 are 
generated from the bilinear factorization model X = WZ', 
where W and Z are x r and T x r matrices with i.i.d. 
entries drawn from Gaussian distributions A/'(0, 100/A^) and 
7\/'(0, 100/T), respectively. Every entry of O is randomly 
drawn from the set { — 1,0,1} with Pr(o„,t = —1) = 
PrCon.t = 1) = 5 X 10^^. To simulate missing data, a 
sampling matrix G {0, 1}^^"^ is generated with i.i.d. 
Bernoulli distributed entries o„ t ^ Ber(0.7) (30% missing 
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Fig. 1. A simulated network graph with N = 25 nodes. 
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Fig. 2. Convergence of the D-PCP algorithm for different network sizes. 
D-PCP attains the same estimation enor as the centralized solver. 



data on average). Finally, measurements are generated as 
rn(y) = O (X + O + V) [cf. ©], and smart meter n 
has available the n-th row of Pfi{Y). 

To experimentally corroborate the convergence and optimal - 
ity (as per Proposition [U of the D-PCP algorithm. Algorithm 
[U is run with c = 1 and compared with the centralized 
benchmark (PI), obtained using the solver in ||27| . Parameters 
Ai = 0.0141 and A, = 0.346 are chosen as suggested 
in 1281 . For both schemes. Fig. |2] shows the evolution of 
the global estimation errors ex[k] := ||X[fc] — X||i?/||X||i? 
and eo[k] := ||0[fc] - 0||F/||0||i^. It is apparent that the D- 
PCP algorithm converges to the centralized estimator, and as 
expected convergence slows down due to the delay associated 
with the information flow throughout the network. The test is 
also repeated for network sizes of = 15 and 35 devices, 
to illustrate that the time till convergence scales gracefully 
as the network size increases. Finally, for = 35 and with 
Q[fc] ■— J2n^n[k]/N, Fig. [3] depicts the consensus error 
ec,n[k] '■= II Q„ [fc] — Q[fc] 1 1 J?/ HQ [fc] I j J? for three representative 
smart metering devices. In all cases the error decays rapidly 
to zero, showing that networkwide agreement is attained on 
the estimates Q„[fc] 
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Fis. 3. Evolution of the consensus error. 
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Fig. 4. School and government building load curve data cleansing. 

B. Real load curve data test 

Here, the D-PCP algorithm is tested on real load curve data. 
The dataset consists of power consumption measurements (in 
kW) for a government building, a grocery store, and three 
schools (N = 5) collected every fifteen minutes during a 
period of more than five years, ranging from July 2005 to 
October 2010. Data is downsampled by a factor of four, to 
yield one measurement per hour For the present experiment, 
only a subset of the whole data is utilized for concreteness, 
where T = 336 was chosen corresponding to 336 hour periods. 
For the government building case, a snapshot of the available 
load curve data spanning the studied two-week period is shown 
in blue e.g., in Fig. |4] (bottom). Weekday activity patterns can 
be clearly discerned from those corresponding to weekends, 
as expected for most government buildings; but different, e.g., 
for the load profile of the grocery store in Fig. |5] (bottom). 

To run the D-PCP algorithm, an underlying communication 
graph was generated as in Section IV-AI A randomly chosen 
subset of 30% of the measurements was removed to model 
missing data. For one of the schools and the government 
building data. Fig. |2] depicts the cleansed load curves that 
closely follow the measurements, but are smooth enough to 
avoid overfitting the abnormal energy peaks on the so-termed 
"building operational shoulders." Indeed, these peaks are in 
most cases identified as outliers. The effectiveness in terms of 
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Fig. 5. Government building and groceiy store load cui've imputation, when 
30% of the data are missing. 

imputation of missing data is illustrated in Fig. |3] (identified 
outliers are not shown here); note how the cleansed (gray) load 
curve goes through the (red) missing data points. The relative 
error in predicting missing data is around 6%, and degrades 
to 8% when the amount of missing data increases to 50%. 

VI. Conclusion 

A novel robust load curve cleansing and imputation method 
is developed in this paper, rooted at the crossroads of sparsity- 
cognizant statistical inference, low-rank matrix completion, 
and large-scale distributed optimization. The adopted PCP 
estimator jointly leverages the low-intrinsic dimensionality of 
spatiotemporal load profiles, and the sparse (that is, sporadic) 
nature of outlying measurements. A separable reformulation of 
PCP is shown to be efficiently minimized using the ADMM, 
and gives rise to fully-decentralized iterations which can be 
run by a network of smart-metering devices. Comprehensive 
tests with synthetic and real load curve data demonstrate the 
effectiveness of the novel load cleansing and imputation ap- 
proach, and corroborate the convergence and global optimality 
of the D-PCP algorithm. 

An interesting future direction is to devise real-time cleans- 
ing and imputation algorithms capable of processing load 
curve data acquired sequentially in time. Online adaptive 
algorithms enable tracking of "bad data" in nonstationary 
environments, typically arising due to to e.g., network topol- 
ogy changes and missing data. In addition, it is of interest 
to rigorously establish convergence of the D-PCP algorithm. 
Such results could significantly broaden the applicability of 
ADMM for large-scale optimization over networks, even in 
the presence of non-convex but highly structured and separable 
cost functions. 

Appendix 
Algorithmic Construction 

The goal is to show that [S1]-[S4] can be simplified to the 
iterations tabulated under Algorithm[T] Focusing first on [S3], 
^ decomposes into N ridge-regression sub-problems 



which admit the closed-form solutions shown in Algorithm [T] 
Moving on to [S4], from the decomposable structure of the 
augmented Lagrangian [cf. ©I (|9]l decouples into per-node 
scalar Lasso subtasks (note that Q„ := [qn,i, . . . , qn,T]') 

o„,t[fc + 1] = argmjn |i [y^.t - Qri^Jfe + l]Pn[fc + 1] - o)^ 



T 



+ Ai|o|i|, t = l,...,T 
Sx,{yn,t - q^,t[A: + l]p„[fc+ 1]), t = 1, 
and \^ri\ additional unconstrained QPs 

F™[fc + 1] - F™[A: + 1] = arg min |-(M™ W + M™[fc], F: 

|Q,„[fc+l]-F™|||]} (10) 



^2 + 



which admit the closed-form solutions 

F™[fc + 1] = F™[A; + 1] =;^(M™W + M;r[fc]) 

zc 

+ ^(Qn[A:+l] + Qm[fc + l]). (11) 

Note that in formulating ( fTOl i. F™ was eliminated using the 
constraints F™ = F™ defining Cf- Using (fTTT i to eliminate 
F™[A;] and F™[fc] from (|5]i and (|6]l respectively, a simple 
induction argument establishes that if the initial Lagrange 
multipUers obey M™[0] = -M"[0] = 0, then M;^[fc] = 
-M™[fc] for all k > 0, where n e A/" and m G J„. The 
set {M™} of multipliers has been shown redundant, and (fTTl i 
readily simplifies to 

F;^[fc+1] = F;^[fc+l] = i {Qn[k + 1] + Q™[fc + 1]) . (12) 

It then follows that F™[fc] = F-^[k] for all k > 0, an identity 
that will be used later on. By plugging (fT2] | in (|5]l, the (non- 
redundant) multiplier updates become 



M;"[A;] = M™[fc-l] + -[Q„[fc]-Q„[fc]], 



n e A/", m £ Jn- 
(13) 

If M™[0] = -M;y0] = 0, then the structure of ^ reveals 
that M™[A:] = -MJJJfc] for all A: > 0, where n e M and 
TO e Jn. 

The minimization ^ in [S4] also decouples in N simpler 
sub-problems, namely 

Q„[fc + 1] = arginin|i||ri„(y„ - Qp„[fc] - o„[fc])|l2 
^||Q||^+ J2 (M™W+M:;[fc],Q) 



+1 E {wQ-Kmi + WQ-F-^mi) 



meJr 

arg min 
Q 



\^n{yn - QP«[fc] - 0„[fc])||2 



p„[fc+l] = argmin I ||y„ - Qn[k + l]p - o„ 
p I- 



IPII2} 



^|lQ|||. + (S„[fc],Q) + c J2 



Q 



(14) 
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where in deriving ([14} it was used that: i) M™[A:] = MJ^[fc] 
which follows from the identities M™[A:] = — M™[A:] and 
MJJ*[fc] = -M;:jj[fc] established earUer; ii) the definition 
S„[fc] 2^„gj^^M™[fc]; and iii) the identity F™[fc] = 
F"J/c] which allows one to merge the identical quadratic 
penalty terms and eliminate both F™ [fc] and F^j [k] using ( fT2b . 
Problem (14) is again an unconstrained QR, which is readily 
solved in closed form by e.g., vectorizing Q and examining 
the first-order condition for optimality. 

Finally, note that upon scaling by two the recursions (fTjt 
and summing them over m G Jn, the update recursion for 
S„[fc] in Algorithm [U follows readily. ■ 
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